The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the chal-
translated by 谷歌翻译
This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, resulting in flexible frontends from 27 self-supervised models and various graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments on three mainstream self-supervised models demonstrate the toolkit's effectiveness and achieve state-of-the-art UASR performance on TIMIT and LibriSpeech datasets. EURO will be publicly available at https://github.com/espnet/espnet, aiming to promote this exciting and emerging research area based on UASR through open-source activity.
translated by 谷歌翻译
Motivated by the increasing application of low-resolution LiDAR recently, we target the problem of low-resolution LiDAR-camera calibration in this work. The main challenges are two-fold: sparsity and noise in point clouds. To address the problem, we propose to apply depth interpolation to increase the point density and supervised contrastive learning to learn noise-resistant features. The experiments on RELLIS-3D demonstrate that our approach achieves an average mean absolute rotation/translation errors of 0.15cm/0.33\textdegree on 32-channel LiDAR point cloud data, which significantly outperforms all reference methods.
translated by 谷歌翻译
Saliency methods compute heat maps that highlight portions of an input that were most {\em important} for the label assigned to it by a deep net. Evaluations of saliency methods convert this heat map into a new {\em masked input} by retaining the $k$ highest-ranked pixels of the original input and replacing the rest with \textquotedblleft uninformative\textquotedblright\ pixels, and checking if the net's output is mostly unchanged. This is usually seen as an {\em explanation} of the output, but the current paper highlights reasons why this inference of causality may be suspect. Inspired by logic concepts of {\em completeness \& soundness}, it observes that the above type of evaluation focuses on completeness of the explanation, but ignores soundness. New evaluation metrics are introduced to capture both notions, while staying in an {\em intrinsic} framework -- i.e., using the dataset and the net, but no separately trained nets, human evaluations, etc. A simple saliency method is described that matches or outperforms prior methods in the evaluations. Experiments also suggest new intrinsic justifications, based on soundness, for popular heuristic tricks such as TV regularization and upsampling.
translated by 谷歌翻译
Airbnb is a two-sided marketplace, bringing together hosts who own listings for rent, with prospective guests from around the globe. Applying neural network-based learning to rank techniques has led to significant improvements in matching guests with hosts. These improvements in ranking were driven by a core strategy: order the listings by their estimated booking probabilities, then iterate on techniques to make these booking probability estimates more and more accurate. Embedded implicitly in this strategy was an assumption that the booking probability of a listing could be determined independently of other listings in search results. In this paper we discuss how this assumption, pervasive throughout the commonly-used learning to rank frameworks, is false. We provide a theoretical foundation correcting this assumption, followed by efficient neural network architectures based on the theory. Explicitly accounting for possible similarities between listings, and reducing them to diversify the search results generated strong positive impact. We discuss these metric wins as part of the online A/B tests of the theory. Our method provides a practical way to diversify search results for large-scale production ranking systems.
translated by 谷歌翻译
随着未来以数据为中心的决策,对数据库的无缝访问至关重要。关于创建有效的文本到SQL(Text2SQL)模型以访问数据库的数据有广泛的研究。使用自然语言是可以通过有效访问数据库(尤其是对于非技术用户)来弥合数据和结果之间差距的最佳接口之一。它将打开门,并在精通技术技能或不太熟练的查询语言的用户中引起极大的兴趣。即使提出或研究了许多基于深度学习的算法,在现实工作场景中使用自然语言来解决数据查询问题仍然非常具有挑战性。原因是在不同的研究中使用不同的数据集,这带来了其局限性和假设。同时,我们确实缺乏对这些提议的模型及其对其训练的特定数据集的局限性的彻底理解。在本文中,我们试图介绍过去几年研究的24种神经网络模型的整体概述,包括其涉及卷积神经网络,经常性神经网络,指针网络,强化学习,生成模型等的架构。我们还概述11个数据集,这些数据集被广泛用于训练Text2SQL技术的模型。我们还讨论了无缝数据查询中文本2SQL技术的未来应用可能性。
translated by 谷歌翻译
最近,由于许多用例的性能要求严格的性能要求,基于意图的管理正在受到电信网络的良好关注。文献上的几种方法采用电信域中的传统方法来满足KPI的意图,可以将其定义为封闭环。但是,这些方法考虑了每个闭环相互独立的环路,从而降低了组合的闭环性能。同样,当需要许多闭环时,这些方法不容易扩展。在许多领域,多机构增强学习(MARL)技术在许多领域都表现出了巨大的希望,在许多领域中,传统的闭环控制效果不足,通常用于循环之间的复杂协调和冲突管理。在这项工作中,我们提出了一种基于MARL的方法,以实现基于意图的管理,而无需基础系统模型。此外,当存在相互矛盾的意图时,MARL代理可以通过优先考虑重要的KPI来暗中激励循环,而无需人工互动。已经在网络模拟器上进行了实验,以优化三种服务的KPI,我们观察到拟议的系统的性能良好,并且在资源不足或资源稀缺时能够实现所有现有的意图。
translated by 谷歌翻译
当代企业在以不确定性,敌意和纯粹的数据量为特征的情况下广泛应用机器推理和人工智能提供了无限的机会。该论文开发了一个评估网络,作为在支持人类运营商的不确定性下进行高级融合和推理的图形系统。估值是(不确定的)知识和收集数据的数学表示形式,被表示为信用集,定义为不精确概率理论框架中的连贯间隔概率。具有这种信用集,组合和边缘化的基本操作被定义为满足评估代数的公理。讨论了信用估值网络的实际实施,并在一个小规模的示例中证明了其实用性。
translated by 谷歌翻译
作为理解过度参数模型中梯度下降的隐式偏差的努力的一部分,有几个结果表明,如何将过份术模型上的训练轨迹理解为不同目标上的镜像。这里的主要结果是在称为通勤参数化的概念下对这种现象的表征,该概念涵盖了此设置中的所有先前结果。结果表明,具有任何通勤参数化的梯度流相当于具有相关Legendre函数的连续镜下降。相反,具有任何legendre函数的连续镜下降可以被视为具有相关通勤参数化的梯度流。后一个结果依赖于纳什的嵌入定理。
translated by 谷歌翻译
图形上的分层聚类是数据挖掘和机器学习中的一项基本任务,并在系统发育学,社交网络分析和信息检索等领域中进行了应用。具体而言,我们考虑了由于Dasgupta引起的层次聚类的最近普及的目标函数。以前(大约)最小化此目标函数的算法需要线性时间/空间复杂性。在许多应用程序中,底层图的大小可能很大,即使使用线性时间/空间算法,也可以在计算上具有挑战性。结果,人们对设计只能使用sublinear资源执行全局计算的算法有浓厚的兴趣。这项工作的重点是在三个经过良好的sublinear计算模型下研究大量图的层次聚类,分别侧重于时空,时间和通信,作为要优化的主要资源:(1)(动态)流模型。边缘作为流,(2)查询模型表示,其中使用邻居和度查询查询图形,(3)MPC模型,其中图边缘通过通信通道连接的几台机器进行了分区。我们在上面的所有三个模型中设计用于层次聚类的sublinear算法。我们算法结果的核心是图表中的剪切方面的视图,这使我们能够使用宽松的剪刀示意图进行分层聚类,同时仅引入目标函数中的较小失真。然后,我们的主要算法贡献是如何在查询模型和MPC模型中有效地构建所需形式的切割稀疏器。我们通过建立几乎匹配的下限来补充我们的算法结果,该界限排除了在每个模型中设计更好的算法的可能性。
translated by 谷歌翻译